

The College Board Review


NUMBER 15 NEW YORK, N.Y. NOVEMBER 1951 


THE EXPERIMENT 
IN GENERAL COMPOSITION . . . . . Earle G. Eley


A NEW VENTURE 
IN THE TESTING OF MOTIVATION . . . . . John V. Gilmore


SCIENCE TEACHING 
AND THE BOARD’S SCIENCE TESTS . . . . . Paul F. Brandwein


THE SCHOLASTIC APTITUDE TEST— 


ITEMS, SCORES, AND COACHING . . . . . Henry S. Dyer


Also in this issue 

Punch cards on candidates available
“Who Should Go to College” book
Board Handbook and Annual Report issued
New college and association members
Executive Committee members, representatives chosen
Special score report fee reduced
West Coast conference held
Transfer Test use increases
Examination dates, 1952-53
Editor added to staff
Elected officers and Executive Committee of the Board
Publications, dates, tests, fees











THE COLLEGE BOARD REVIEW 


News and Research of the 
College Entrance Examination Board 


Published three times a year by the 
College Entrance Examination Board 
425 West 117th Street, New York 27, N. Y. 


Frank H. Bowles, Director
William C. Fels, Secretary


IBM punched card service 
for colleges announced 


In order to encourage research on ad- 
missions criteria, the Board has announced 
that it will make available to the member 
colleges duplicate sets of punched IBM cards 
for their candidates. This service will begin 
with the December, 1951, series of tests. 
The master card is punched with a candi- 
date’s name, examination number, and test 
scores, together with coded information in- 
dicating whether he is a preliminary or final 
candidate, what test program he has taken 
(morning, afternoon, or both), the area of 
his residence, the type and location of his 
secondary school, and the college to which 
the duplicate card will be sent. The charge 
for the service has been tentatively set by 
the Director at $25 per year for up to 1000 
candidates, plus $1 for each additional 100 
candidates or part thereof. 
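
The tentative charge described above is a flat fee plus a stepwise surcharge. The arithmetic can be sketched in Python as follows. The function name `punched_card_fee` is hypothetical, and the figures are the tentative ones quoted above, which may be revised.

```python
def punched_card_fee(candidates: int) -> int:
    """Yearly charge in dollars under the tentative schedule quoted above."""
    if candidates < 0:
        raise ValueError("candidates must be non-negative")
    base_fee = 25  # covers up to 1000 candidates
    extra = max(0, candidates - 1000)
    # $1 for each additional 100 candidates "or part thereof": round up.
    surcharge = (extra + 99) // 100
    return base_fee + surcharge


if __name__ == "__main__":
    for n in (500, 1000, 1001, 1250):
        print(n, "candidates:", punched_card_fee(n), "dollars")
```

Under this reading, a college with 1250 candidates would pay $25 plus $3 for the 250 candidates beyond the first thousand.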


Commission to publish book 
on who should go to college 


“Who Should Go to College in America” is 
the title of a new book by Byron Hollins- 
head to be published soon by Columbia 





University Press for the Commission on 
Financing Higher Education. The book is 
the outcome of a study conducted under the 
auspices of the College Entrance Exami- 
nation Board. 

The new book deals with such important 
questions as “Who now goes to college?”; 
“What determines their going?”; “Who 
should go among those not now going?”; 
“What measures are necessary to get more 
of the high-ability group to go?” and “How 
can we arrange this and what would it 
cost?” 

Dr. Hollinshead has been President of 
Coe College and Keystone Junior College 
and was connected with the General Edu- 
cation Board. He was one of the authors of 
the Harvard report, “General Education in 
a Free Society.” 


New College Handbook and
Annual Report issued


The Board’s new College Handbook, an
enlarged and completely rewritten version
of its well-known Terms of Admission to the 
Member Colleges, was distributed free to 
6000 colleges and schools during October. 
The Board will also send free copies for the 
first time to preliminary (junior) candi- 
dates. Approximately 12,000 preliminary 
candidates are expected to take the Board’s 
tests this year. This distribution will begin 
in January. The purchase price of single 
copies has been reduced from $1.50 to $1.00. 
The free distribution to juniors and the re- 
duction in price were authorized by the 
Executive Committee in the interests of 
earlier and better college guidance. 

The Fiftieth Annual Report of the Di- 
rector was also distributed during October. 
It includes, as did the Forty-ninth Report, 







a section on “Data for Interpreting the 
Tests,” prepared by Dean Henry S. Dyer, 
Director of the Office of Tests of Harvard 
University. 


Colleges and associations 
admitted to Board 


Ten colleges and two associations were ad- 
mitted to membership in the College En- 
trance Examination Board at its October 
meeting. With the addition of the new 
members, the rolls of the Board now 
number 134 colleges and 23 associations. 
Each college is entitled to one voting and 
one non-voting representative. Secondary 
school associations are entitled to from one 
to five voting representatives, depending on 
the nature and scope of the association. 
The new colleges and associations are: 


Albertus Magnus College
College of Mount Saint Vincent
College of Wooster
Emory University
Lewis and Clark College
Newark College of Engineering
Ohio Wesleyan University
Reed College
Rensselaer Polytechnic Institute
Seton Hill College


High School Principals Association of New York City 
Jesuit Educational Association 


Fee for special score reports 
to colleges reduced 


The fee charged candidates for late or extra 
score reports to colleges has been reduced. 
The Board will continue to send score re- 
ports without charge to one, two, or three 
colleges named on the application blank or 
requested before the close of final regis- 
tration. For additional reports, or reports 
requested after the closing of registration, 
the charge will be $1 for each one, two, or 
three reports requested in the same com- 
munication. The former fee was $1 for each 
late or extra report requested. 
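
The difference between the former and the new fee can be illustrated with a short Python sketch. It assumes one reading of the rule above, namely that late or extra reports requested in a single communication are charged $1 per group of up to three. The function name `late_report_fee` is hypothetical.

```python
def late_report_fee(reports: int, former_schedule: bool = False) -> int:
    """Dollars charged for late or extra reports requested in one communication."""
    if reports < 0:
        raise ValueError("reports must be non-negative")
    if former_schedule:
        # Former fee: $1 for each late or extra report.
        return reports
    # New fee (assumed reading): $1 for each group of one, two, or three reports.
    return (reports + 2) // 3


if __name__ == "__main__":
    for n in (1, 3, 4):
        print(n, "reports: former", late_report_fee(n, True),
              "new", late_report_fee(n))
```

On this reading a candidate asking for three extra reports at once pays $1 rather than the former $3.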




Executive Committee members, 
representatives elected 


Dr. William H. Cornog, Dr. Finla G. Craw- 
ford, and Dean Frank J. Gilliam were 
elected to the College Board’s Executive 
Committee at the Board’s October 31 meet- 
ing. Mr. John Irvine Kirkpatrick was 
elected Custodian. They will serve for three 
years. 

Dr. Cornog is President of Central High 
School in Philadelphia and Chairman of the 
Board’s Committee on Examinations. Dr. 
Crawford is Vice-Chancellor of Syracuse 
University. Mr. Gilliam is Dean of Students 
at Washington and Lee University. The 
newly elected Custodian, Mr. Kirkpatrick, 
left Lehigh University recently to become 
Comptroller at the University of Chicago. 

The retiring members of the Executive 
Committee, who completed their terms in 
October and were not eligible for reelection, 
are Professor Francis L. Bacon of the Uni- 
versity of California, Dr. Lemuel R. John- 
ston of Clifford J. Scott High School, East 
Orange, New Jersey, and Vice-President E. 
Kenneth Smiley of Lehigh University. 
President James H. Case, Jr. of Bard Col- 
lege is the retiring Custodian. 

The Board also elected Dr. Cornog and 
four others to be secondary school repre- 
sentatives-at-large for a term of three years. 
The others are Miss Allegra Maynard, 
Madeira School; Dr. Morris Meister, Bronx 
High School of Science; Mr. Howard L. 
Rubendall, Mount Hermon School; and Mr. 
Russell H. Rupp, Shaker Heights High 
School, Cleveland, Ohio. 

Current elected officers and members of 
the Executive Committee of the Board are 
listed on page 239 of this issue of the 
Review. 











West Coast regional conference 
to be held at Santa Barbara 


College Board representatives on the West 
Coast will hold the third in a series of 
regional meetings at Santa Barbara, Cali- 
fornia, on November 14. The two earlier 
regional meetings were held at Evanston, 
Illinois, and Roanoke, Virginia. 

The meeting, which will be held in con- 
nection with the convention of the Pacific 
Coast Association of Collegiate Registrars, 
will be at the Mar Monte Hotel. Following 
a luncheon to which representatives of state 
education authorities, secondary schools, 
and non-Board colleges have been invited, 
the group will discuss admissions problems 
peculiar to the West Coast. Frank H. 
Bowles, Director of the Board, will address 
the luncheon. 


College Transfer Test use 
increased in 1950-51 


Registration for the College Transfer Test 
and experimental administrations in Cali- 
fornia and elsewhere provide evidence of 
increasing interest in the new test. Over 
1300 candidates attended the five adminis- 
trations of the College Transfer Test in 
1950-51. This represents a 51 per cent gain 
over 1949-50, the year of its introduction, 
when it was administered only twice. At 
the most recent series, in August, 1951, 
there were 281 candidates as compared to 
90 in August, 1950.

During May, 1951, the Intermediate 
Tests for College Students, including the 
College Transfer Test and the Proficiency 
Tests (the latter being available only by
special arrangement), were administered to
approximately 2000 California junior col- 
lege and University of California lower- 
division students who were planning to 
transfer into the junior year of the Uni- 
versity. A validation study is planned. This 
experimental administration was super- 
vised by the Pacific Coast Office of the 
Educational Testing Service and supported 
by the University of California. 

On a smaller scale was the administration 
of the CTT by Davidson College in April,
1951, to a randomly selected group of 50 of 
their own end-of-year sophomores in order 
to establish local standards against which 
to judge the potentiality of their transfer 
applicants. 


Examination dates for 1952-53 


At its October meeting, the College Board 
selected the testing dates for the academic 
year 1952-53. The new dates and the com- 
parable dates for 1951-52 are: 

1951-52               1952-53
December 1, 1951      December 6, 1952
January 12, 1952      January 10, 1953
March 15, 1952        March 14, 1953
May 17, 1952          May 16, 1953
August 13, 1952       August 12, 1953


Board adds editor to staff


On November 1, S. Donald Karl joined the
staff of the College Entrance Examination 
Board in the newly created position of 
Editor. In his new position, Mr. Karl will 
be responsible for all the Board’s publi- 
cations, including the Review, which for the 
past three years have been edited by the 
Secretary of the Board, William C. Fels. 
Until his appointment to the Board’s staff, 
Mr. Karl was editor of the Columbia
Alumni News. 










The Experiment in General Composition 
Earle G. Eley 


Earle G. Eley is Examiner in Humanities at the
University of Chicago and a research consultant
to the College Entrance Examination Board.


Whether it is possible to evaluate writing by 
means other than having a student write is a 
question which has been much debated in the 
past. During the Thirties and early Forties, 
when machine-scored tests were being de- 
veloped to measure all kinds of skills, abilities, 
aptitudes, and attitudes, it was only natural 
that objective tests should have been devised 
which sought to measure writing competence. 
These objective tests carried the day: they com- 
bined an appearance of extreme preciseness 
with economy and reliability of grading. Even 
some of those who had worked long with the 
essay-type examination were finally convinced 
that skillfully contrived objective instruments 
were superior to the essay as measures of 
writing competence. Professor John Stalnaker, 
who had worked with the College Board essay 
tests ever since 1936, and who had seen the 
reading reliability of the College Board English 
essay rise to an unheard of .88 in 1939, is quoted 
by Dr. Claude M. Fuess, in his The College 
Board: Its First Fifty Years, as writing in 1943,
two years after the essay test had been discontinued
by the Board:

The type of test so highly valued by teachers of
English, which required the candidate to write a
theme or essay, is not a worth-while testing device.
Whether or not the writing of essays as a means of
teaching writing deserves the place it has in the
secondary school curriculum may be equally questioned.
Eventually, it is hoped, sufficient evidence
may be accumulated to outlaw forever the “write-a-theme-on”
type of examination.


GCT plans for 1951-52


The General Composition Test experiment will
be continued this year as a result of action taken
at the Board’s October 31 meeting. Emphasis
will be placed on validation of the test. A form
of the test will be made available to schools for
their own purposes this spring. Details will
be announced in the mid-winter issue of the
Review, now scheduled for February.


Despite the enthusiasm for a new type of test 
evidenced in the above quotation, the feeling 
persisted among teachers of English, and among 
others directly concerned with the school cur- 
riculum, that the ability to write is a major ob- 
jective of American education, which can be 
satisfactorily evaluated only by the direct 
method of having the candidate write some kind 
of essay. Advocates of objective English tests 
have been unable to produce convincing evi- 
dence that such tests constitute a valid measure 
of the ability to write creatively. The educa- 
tional objectives which such tests might be said 
to measure can be stated somewhat like this: the 
ability to recognize grammatically incorrect 
constructions, the ability to select from several 
alternatives a correct construction to fit a 
context, the ability to rearrange previously con- 
cocted sentences into a well-organized para- 
graph. It cannot be said that the ability to 
write is one of these objectives. The case for the 
objective test must rest on the proposition that 
“recognition” abilities are directly associated 
with writing. 

The experiment briefly reported here was 
carried on during the past year by the College 
Entrance Examination Board and twenty-nine 
cooperating schools, both public and indepen- 
dent. It was designed to test whether the ability 
to synthesize personal and educational ex- 
perience and to apply this synthesis to the 













written solution of a problem, in short, to write 
creatively, could be directly measured by means 
of an improved essay test. The General Com- 
position Test was conceived neither as a return 
to the traditional College Board essay exami- 
nation, nor as a potential substitute for ob- 
jective tests. It was felt that the abilities that 
make up general writing competence are at 
present inadequately measured and that a 
supplement to objective instruments is needed. 
We sought to develop an essay test which would 
avoid the most common objections to essays: 
(1) that what an essay test measures is vague, 
and that knowledge of content and writing
skills cannot be separated; (2) that essay tests, 
by their very nature, must be based on subject- 
matter content and hence tend to dictate 
secondary school curriculum; and (3) that 
essays cannot be read reliably. 


GRADED FOR FIVE QUALITIES 

We sought to meet the first of these objections 
by having the essay books graded for five 
separate qualities: Mechanics, Style, Organi- 
zation, Reasoning, and Content. Thus Content 
could be separated from other writing skills, 
and we were, in addition, able to say with some 
accuracy just what we were attempting to 
measure. The second of these objections we 
sought to meet by providing background read- 
ing materials to accompany the essay problem. 
Thus, the test did not depend upon any specific 
subject matter in the curriculum, but instead 
was designed to minimize subject-matter differ- 
ences in educational background among candi- 
dates: we attempted to evaluate the general 
objectives which relate to the ability to write 
rather than any content in the curriculum. Our 
procedure for meeting the third of these ob- 
jections will be discussed later. 

However, the structuring of the essay prob- 
lem with reading materials and the scoring of 
the essays according to five qualities were 
adopted for more positive reasons than merely 
to answer objections against essay tests. We 
felt that the reading materials, by minimizing
differences in the subject-matter backgrounds
of candidates, would render the test a “purer”
measure of writing competence. Our hypothesis
was that this would be evidenced in a higher
general level of achievement and a more adequate
distribution of scores than would be the
case if reading materials were not present. We
felt, further, that supplying reading materials
would make it possible for the less able candidates
to “start” an essay and, as a result, provide
evidence with respect to their writing
competence while, without the stimulus of such
materials, they might not be able to write an
essay at all.


MEAN GRADES: FORMS A AND A-1

             Mechanics    Style        Organization  Reasoning    Content      Number of Students
Form A       [illegible]  [illegible]  [illegible]   [illegible]  2.40         350
Form A-1     2.33         2.71         2.69          2.89         2.61         100
Difference   .08          [illegible]  [illegible]   [illegible]  [illegible]

a Significant to the 5 per cent level.
b Significant to the 1 per cent level.

In order to test these hypotheses, each of the 
two experimental forms, called A and D, was 
accompanied by a control form, respectively 
A-1 and D-1, which posed the identical essay 
problem but omitted the background readings. 
The two problems chosen were: “Should Women 
be Given Identical Educational and Pro- 
fessional Opportunities with Men?” (Forms A 
and A-1), and “Is There a Conflict between 
Science and Human Values?” (Forms D and 
D-1). As a further check on the effects of back- 
ground structuring, Form D-1, while it lacked 
the readings that accompanied Form D, was 
structured with seven provocative questions 
designed to stimulate the student to begin his 
essay. Form A-1 was totally lacking in back- 
ground materials. 

A comparison between the results of Forms 
A and A-1 seems to bear out the hypotheses 
about student performance. The mean grades 
achieved by students who responded to Form A 
are compared in the preceding table with those 
achieved by students who responded to Form 










A-1. It should be noted that the essays were 
graded for each quality on a scale from 1 
(Superior) to 4 (Inadequate); hence a high 
numerical mean indicates a comparatively 
lower level of achievement. 

It will be seen that students responding to 
Form A achieved better scores than those responding
to Form A-1 in every case, the differences
on Reasoning, Organization, and Content
being such that they would not have occurred 
by chance more than one time in a hundred. 
That these differences in achievement were the 
result of the readings is indicated by the fact 
that the students who responded to A-1 were 
randomly selected from the entire group, and 
by the fact that the readers in no case were 
aware that a given essay had been written in 
response to Form A-1. Many of the essays 
written in response to Form A contained no 
direct reference to the reading materials, and 
the readers were instructed that excellence of 
achievement had nothing to do with a depen- 
dence upon the readings. The fact that Form 
D-1, structured with only seven questions, 
functioned almost as well as Form D—no sig- 
nificant differences emerged—suggests that 
even a small amount of structuring, when the 
essay problem is carefully worded, is sufficient 
to improve the quality of student performance. 

The distribution of scores of students re- 
sponding to Forms D and D-1 showed no 
marked differences. However, the distribution 
of scores of students responding to Forms A 
and A-1 bears out again our hypothesis with 
respect to the effects of background structuring. 
On the qualities of Mechanics and Style no 
significant differences emerged, as might have 
been expected; however, although 5+ per cent 
of the students who responded to Form A re- 
ceived the grade of Superior on Organization, 
3 per cent on Reasoning, and 4+ per cent on 
Content, not a single student responding to 
Form A-1 received a grade of Superior on any of 
these three qualities. On the other hand, more 
students responding to Form A-1 received the 
grade of Inadequate on these qualities than did 


those responding to Form A. It would appear 
that with respect to Organization, Reasoning, 
and Content, students who responded to Form 
A-1 did not perform at their optimal level, and, 
as a consequence, not only did they receive 
grades consistently lower than did students 
responding to Form A, but their grades were 
distributed over the three-point scale from 2 
to 4 rather than over the whole scale from 1 to 4. 

The training of readers and the planning of 
the Reading Conference were based upon a 
number of assumptions that differed somewhat 
from those customary in the past. We assumed 
that systems of counting and weighting errors 
would be likely to create unnecessary disputes 
about the nature of English prose. We provided 
the readers with no previously devised lists of 
errors or aberrations and with no previously 
formulated definitions of qualities. Instead, at 
the beginning of the training period, each reader 
was presented with a notebook entitled “Man- 
ual for Readers,” which contained twenty sheets 
of blank ruled paper; they were to create their 
own manual. Definitions were arrived at by dis- 
cussion, so that readers could feel that they had 
shared in the formulation of these definitions. 
We regarded the reading of the essays as an act 
of free, critical judgment. 

A second assumption was that the student 
who wrote the essay, rather than the essay itself, 
should be judged. In other words, we regarded 
each book as an extensive piece of evidence 
about the candidate’s habitual writing practice. 
Thus the levels of achievement with respect 
to each quality could be defined in operational 
rather than in absolute terms. That is to say, 
we did not try to discover what, in the ultimate 
scheme of things, superior Style for the twelfth 
grader might be; we defined the levels in terms 
of readiness for college work. 

A third assumption was that agreement be- 
tween readers would be improved if they were 
asked to judge each essay separately on five 
qualities rather than to juggle their impressions 
of the essay and come out with a single score. 
We assumed, for example, that a disagreement 










about the quality of Style in an essay book 
could, by this means, be confined to that area, 
and hence not affect possible agreement on the 
other four qualities. 

The plan of reading was designed to test the 
following hypotheses: (1) that readers would 
be best able to judge essays that were compar- 
able—and that the background reading 
materials would elicit more or less comparable 
essays; (2) that readers would be able to attain 
a high degree of consistency if they were asked 
to formulate cooperatively their own definitions 
of qualities and to evaluate those qualities by 
an act of free, critical judgment; and (3) that 
readers would prove able to distinguish be- 
tween the five qualities upon which the essays 
were read to the extent that these qualities 
would emerge as elements which, though re- 
lated to the total essay, would be more or less 
independent of each other. 

During the week of the Reading Conference, 
900 essays were given two independent readings
by twenty competent readers. No reader ever 
knew whether he was a first or second reader. 
He never knew whether the paper he was read- 
ing was written by a boy or a girl, whether it 
was a response to Form A or A-1, D or D-1, 
written by a student from the East or from the 
West, independent school or public. The results 
of this reading are reported in the following 
table: 


CORRELATIONS OF GRADES ASSIGNED
BY FIRST AND SECOND READERS

             Mechanics  Style  Organization  Reasoning  Content  Number of Students
Form A       .97        .89    .88           .88        .94      350
Form A-1     .89        .90    .65           .82        .66      100
Form D       .88        .86    .84           .90        .90      350
Form D-1     .79        .96    .93           .82        .81      100

Average r, all qualities, all forms (900 cases)            .88
Average r, all qualities, Forms A and D (700 cases)        .90
Average r, all qualities, Forms A-1 and D-1 (200 cases)    .85


The high correlations reported in the above 
table would seem to bear out the hypothesis 
that independent readings based on critical
judgment rather than on counting of errors
can result in a high degree of consistency. The 
striking differences between Forms A and A-1 
on Organization and Content seem to bear out 
the hypothesis that totally unstructured tests 
will result in essays more difficult for readers 
to agree upon than will those with some struc- 
turing, at least with respect to certain qualities. 

That readers could distinguish between the 
five qualities is evidenced by the fact that inter- 
correlations between the qualities ranged from 
.45 between Organization and Mechanics to .81
between Content and Reasoning. Even the cor- 
relation of .81 indicates only 66 per cent com- 
monality between qualities, and that of .45 only 
20 per cent commonality. 
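
The commonality figures quoted above agree with the usual practice of squaring a correlation coefficient to obtain the proportion of variance two measures share. The short Python sketch below reproduces them on that assumption. The function name `shared_variance_pct` is illustrative and does not come from the experiment.

```python
def shared_variance_pct(r: float) -> float:
    """Percentage of variance two measures share, taken as 100 * r squared."""
    return 100 * r * r


if __name__ == "__main__":
    pairs = [
        ("Content and Reasoning", 0.81),
        ("Organization and Mechanics", 0.45),
    ]
    for name, r in pairs:
        pct = shared_variance_pct(r)
        print(f"{name}: r = {r:.2f}, shared variance about {pct:.0f} per cent")
```

Squaring .81 gives roughly .66 and squaring .45 gives roughly .20, which matches the 66 and 20 per cent reported.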


IS THE TEST VALID? 


It has been the purpose here to report only 
the high points of the experiment. Such minor 
considerations as the time taken to read the 
essay books (actually only an average of 27.5 
minutes per book for all readings) must be 
left for a fuller report. It may be said, however, 
that the results of the experiment strongly 
suggest that an improved essay test is feasible 
as a direct measure of the ability to write. The 
present experiment was designed to test the 
success of the examination forms and to test 
whether the essays could be reliably read. 
Future investigation should provide answers to 
such questions as: Will such a test prove to be as 
good a predictor of college success as are present 
instruments? How valid are the part scores in 
diagnosing student writing behavior? To what 
extent is the test valid as a measure of the can- 
didate’s habitual writing practice? 

At present we can say that we have achieved 
some success in measuring those educational 
objectives which relate directly to the ability 
to write a unified essay. That such objectives 
are important to education will scarcely be 
denied. It should be remembered that the inter- 
relationship between test-maker and teacher 
is a complex and subtle one. The test-maker 
constructs tests which will evaluate those ob- 


[ 220 ] 








objectives which the teacher has deemed desir- 
able; he does not himself consciously dictate 
those objectives, though he may ask for a 
clarification of them. However, whether the 
test-maker would have it so or not, his tests 
exert a constant influence on classroom teaching.
If the active process of writing is not evaluated,
writing may cease to be taught. Objectives
which are never measured will tend to disappear 
from the classroom. If we think the ability to 
write is important, we must evaluate the suc- 
cess with which it is taught. 


A New Venture in the Testing of Motivation 
John V. Gilmore


For many years colleges have been attempt- 
ing to predict a student’s academic success by 
the use of such measures of intellectual ability as 
high school grades, rank in class, and scores on 
standard I.Q. tests. In this endeavor the colleges 
have made much progress. For example, the cor- 
relation between the first-term grades and a 
combination of four predictor variables at 
M.I.T. is about .60, which is in one sense a very
high correlation. However, a great amount of 
error still creeps into any prediction of scholastic 
success which an individual college may use. 
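
Why a correlation of .60 still leaves a great amount of error can be shown with a little arithmetic. The Python sketch below assumes ordinary linear prediction, for which the share of variance left unexplained is one minus the squared correlation and the typical prediction error shrinks only by the square root of that share. The function names are illustrative.

```python
import math


def unexplained_share(r: float) -> float:
    """Proportion of criterion variance not accounted for by the predictors."""
    return 1 - r * r


def error_ratio(r: float) -> float:
    """Typical prediction error relative to predicting the mean for everyone."""
    return math.sqrt(1 - r * r)


if __name__ == "__main__":
    r = 0.60  # correlation quoted above for first-term grades
    print(f"unexplained variance: {unexplained_share(r):.2f}")
    print(f"error relative to no prediction: {error_ratio(r):.2f}")
```

On these assumptions a correlation of .60 leaves about 64 per cent of the variance in grades unexplained, and the typical error of prediction is still about four fifths of what it would be with no predictors at all.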

The number of variables affecting academic 
achievement is legion. Obviously, one must take 
into consideration the variation in course con- 
tent, the grading habits of faculty members, the 
differences in teaching methods, and differences 
for some students in the difficulty level of certain 
courses. But even when all of these factors are 
considered, they do not seem to account for the 
difference between the prediction and the actual 
performance of the student. There are obviously 
other variables, including motivation, that have 
thus far proved elusive. 

The course of educational psychology in the 
last twenty-five years has led us finally to an 
attempt to define and measure motivation as 
the most important variable which has not yet 
entered into our statistical predictions. We now 
believe that clinical psychiatry and clinical psy- 
chology are ready to offer a definition of motiva- 


tion that should explain differences between 
achievers and non-achievers which may lead 
ultimately to the validation of some predictive 
instrument. 

Our definition of motivation comes from five 
sources. First, there are articles in the psycho- 
analytic literature which have made some at- 
tempt to explain the relationship of motivation, 
effort, and academic success. Second, there are 
some well-controlled experimental studies in 
child psychology which point to the consequence 
of the early parent-child relationships in the de- 
velopment of personality. Third, there are other 
studies being conducted currently in various 
parts of the country which show the importance 
of the parent-child relationship in the adjust- 
ment of the adolescent group. Fourth, the ex- 
perience of psychiatrist and psychologist indi- 
cates that many of the problems of the student 
body at M.I.T. have their origin in the early 
home relationship. Fifth, a study conducted at 
M.I.T. during the summer of 1951 has led to interesting
measurable differences between a high-
achieving and a low-achieving group. 

For some time Professor B. A. Thresher, 
Director of Admissions at M.I.T., and members 
of his staff have been interested in the problem 
of increasing the already high predictive index, 
to which we have referred. In a rather small re- 
search project last year with students who were 
in group therapy, the psychologists found that a 


Sentence Completion Test, a Health Record, 
and a Draw-A-Person Test appeared to have 
some usefulness in predicting failure. These re- 
sults were sufficiently encouraging to warrant 
further investigation. 

The Sentence Completion Test is a projective 
test used by the psychologist to diagnose emo- 
tional reactions. It is a test in which there are 
open-end sentences or, as the name indicates, 
there are suggested words to which the person 
adds other words to complete a sentence. These
suggested words or phrases can be of different
length. Theoretically, the shorter the suggested
word or phrase, the more diagnostic the sentence
becomes. In other words, the less structured the 
stimuli, the more will the person project his own 
personality on the completed sentence. Since 
World War II the Sentence Completion Test 
has met with a great deal of approval and use in 
the field of clinical psychology. It has the advan- 
tage of being understandable; it uses the usual 
medium of communication, namely, words, and 
the symbolic aspects of it are more readily 
understood by the untrained person than they 
are in the Rorschach or Thematic Apperception 
Test. It seems reasonable to assume that the 
best measure of emotional adjustment should be 
the most direct, namely, a measure of the me- 
dium through which emotion is usually ex- 
pressed—the individual’s language. Therefore, it 
is a practical instrument and a very usable one. 
It appeared from the group therapy project that 
a thorough investigation of the scores of the Sen- 
tence Completion Test might give us some clue 
to this unknown factor that appears to differen- 
tiate the achiever from the non-achiever. 

John V. Gilmore is Clinical Psychologist at Massachusetts
Institute of Technology and Associate
Professor of Psychology at Boston University.
The Board is contributing to the support of the
experiment he describes.


Early in the spring term of the year 1950-51,
we had asked twenty students at M.I.T. who
had the very high cumulative rating of 5.0 or
near that to volunteer for testing. All twenty of
them consented, and we administered the bat- 
tery of tests that is used in the Psychological 
Office. This battery consists of the Wechsler- 
Bellevue as an individual measure of general 
ability, the Cooperative English Test C2 (Read- 
ing Comprehension) as a timed test of reading 
skill, the Ohio State University Psychological 
Test as a power test of academic aptitude, and 
the Kuder Preference Record as a measure of 
vocational interest. With some students the 
Stanford Scientific Aptitude Test was also used. 
To all students we gave, as a means of diagnos- 
ing their emotional adjustment and defense 
mechanisms, the Sentence Completion Test, cer- 
tain cards of the Thematic Apperception Test, 
the Draw-A-Person Test, and a Health Record 
on which the student could check physical com- 
plaints which might be psychosomatic in origin. 


SELECTION OF TEST GROUPS 


It would appear that a very simple, sensible, 
and, at the same time, scientific way to study 
the factor or factors that differentiate students 
who achieve at a high level from those whose 
scholastic rating is low would be to study care- 
fully two groups which are already differentiated 
by an accepted criterion, such as professors’ 
marks. We chose fifteen of the twenty high- 
achieving students who had taken the battery 
described above and compared them with fifteen 
of the students who either had become disquali- 
fied or were on the margin scholastically. The 
students in this low-achieving group had either 
been referred to the psychologist or had come of 
their own volition. Differences between these 
two groups on the battery used to diagnose emo- 
tional adjustment and defense mechanisms en- 
couraged further research. We then increased 
the number in each of the two groups to thirty- 
five; we did this by lowering the cumulative rec- 
ord requirement for the high-achieving group to 
4.0, which includes students on the first or sec- 
ond Dean’s List. In the low-achieving group we 
included students whose cumulative record was 
2.65 or below. In comparing these larger groups
we found differences in the means on the diag-
nostic battery. It should be said here that differ- 
ences had been found between the low and high- 
achieving groups in their scores on the tests of 
intelligence and achievement and in the Predic- 
tive Rating compiled by the Admissions Office. 
The purpose of further experiment would be to 
discover whether the differences found between 
the groups on the projective tests would add to 
rather than duplicate the differences found by 
the methods usually employed. 
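
One way to picture the question of whether a new measure adds to, rather than duplicates, an existing predictor is a partial correlation. The sketch below is only an illustration and is not a procedure described in this article: the function names and the sample figures are hypothetical, and the choice of partial correlation is an assumption made for the sake of the example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(new_score, criterion, existing):
    """Correlation of new_score with criterion, holding existing constant.

    A value near zero suggests the new score duplicates the existing
    predictor; a clearly non-zero value suggests it adds information.
    """
    r_nc = pearson(new_score, criterion)
    r_ne = pearson(new_score, existing)
    r_ce = pearson(criterion, existing)
    denom = math.sqrt((1 - r_ne ** 2) * (1 - r_ce ** 2))
    if denom == 0:
        raise ValueError("a predictor is perfectly correlated with another variable")
    return (r_nc - r_ne * r_ce) / denom


# Hypothetical figures for four students (illustration only).
existing_rating = [1, 2, 3, 4]      # stands in for an established predictor
projective_score = [1, -1, -1, 1]   # stands in for a new test score
grades = [2, 1, 2, 5]               # stands in for the criterion

print(partial_correlation(projective_score, grades, existing_rating))
```

In these made-up figures the new score is unrelated to the existing rating, so whatever relation it has to the criterion is information the existing rating does not supply.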

We then turned to the Sentence Completion 
Test to find on which items differences were ap- 
parent. When the thirty-five students in each 
group had been chosen, we copied the answer for 
each of 130 questions for all seventy students. 
The answers were placed side by side and 
studied. On some items we found definite quali- 
tative differences in their answers. On other 
items the responses were quite similar. 

Although answers tend to overlap, the follow- 
ing responses to a few are more representative of 
one group than of the other: 

The word father was given in the middle of a 
sentence and the student was asked to write 
words before and after it, making a good sen- 
tence. The superior group was characterized by 
these kinds of answers: “I admire my father very 
much for his quiet understanding and friendly 
way.” “I would like to earn enough money so 
that my father could take it easy for a while.” 
“My father is a good example.” “I can at least 
see my father as a human being, disappointed, 
frustrated, hopeful as all humans are.” “My 
father was one of the most brilliant men I have 
ever met.” “My father is a great guy.” “I like my 
father because he always trusts me.” The in- 
ferior group made these kinds of statements: “I 
don’t think a father will approve.” “I often won- 
der what my father thinks of his life.” “A man 
can learn from his father as well as he can learn 
from his mistakes.” “I hope that someday my 
father will tell me that I am doing something 
right.” “I wish my father would give me a break.” 
“I would like to be a lot closer to my father than
I am today.” There are apparent differences be-
tween the groups in the quality of the answers.

Another item on which we received dif- 
ferent qualities of answers was the incomplete 
sentence, He is dependent upon. The superior 
students gave answers like these: “He is depend- 
ent upon personal success for meaning in life”; 
“his job”; “the whims of his boss”; “her bank 
account”; “her sympathy”; “his parents for sup- 
port”; “his own resources”; “himself and not 
other people’s help.” The inferior group then 
gave answers like these: “He is dependent upon 
other people”; “his emotions to over great ex- 
tent”; “his wife as he was dependent upon his 
mother”; “liquor for relaxation”; “his parents, 
even at his age of twenty-five”; “his father and 
mother for financial, mental, and spiritual help.” 

Another item that differentiated the two 
groups was I would like to be. For the superior
group we have such answers as: “I would like to 
be a really good mathematician”; “a physicist 
or mathematician”; “a successful architect”; “a 
research chemist”; “to be successful as an engi- 
neer”; “able to have a 4.00 average when the 
term ends.” For the low-achieving group we 
have such answers as these: “I would like to be 
married”; “happy in life”; “able to earn a good 
living”; “to be a ‘man’”; “be certain that I am 
in the right school”; “more perfect than I am.” 


A NON-DIFFERENTIATING ITEM 


One item that we thought would differentiate 
the groups, but that did not, was the one, When 
things become difficult. For the high-achieving 
group the answers are like these: “When things 
become difficult, I force myself to relax and let 
it come”; “I give up”; “I turn to philosophy”; 
“I will become more difficulter (sic)”; “I work
harder”; “I try to adjust.” For the inferior group 
such answers as these: “When things become 
difficult, I go on to something easier”; “I fight 
harder”; “one should fall back on the so-called 
scientific method to pull me out of it”; “sit down 
and think it out”; “I try doubly hard to solve 
them.” One can detect an element of similarity 
in these answers. 


We then devised a scoring system by which
we weighted the items on a scale from plus two
to minus two. The items that did not produce a 
statistically significant difference were elimi- 
nated. Only about thirty-seven of the items 
differentiated the two groups; the other ninety- 
three did not. 
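
The scoring procedure just described can be sketched in a few lines of code. This is an illustration under stated assumptions, not the scoring key actually used: the item names and ratings are hypothetical, the scale of plus two to minus two is assumed to be applied to each response on a retained item, and the set of retained items is taken as given because the article does not say how significance was tested.

```python
# Hypothetical sketch of a weighted item score; not the author's actual key.


def total_score(ratings, retained_items):
    """Sum the -2..+2 ratings over the items kept in the scoring key.

    ratings: dict mapping item name -> rating given to the student's answer
    retained_items: set of item names that differentiated the two groups
    """
    score = 0
    for item, rating in ratings.items():
        if item not in retained_items:
            continue  # eliminated items contribute nothing to the score
        if not -2 <= rating <= 2:
            raise ValueError(f"rating for {item!r} must lie between -2 and +2")
        score += rating
    return score


# Illustrative data only: three retained items and one eliminated item.
retained = {"father", "he is dependent upon", "I would like to be"}
student = {
    "father": 2,
    "he is dependent upon": 1,
    "I would like to be": -1,
    "when things become difficult": 2,  # eliminated item, ignored
}
print(total_score(student, retained))  # prints 2
```

Keeping the retained items in a separate set mirrors the two steps in the text: items are first kept or eliminated, and only the kept items are then weighted.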

An analysis of the differences on the Sentence 
Completion Test indicated that the achieving 
student is characterized by a much happier re- 
lationship with his father, a closer identification 
with his mother, and a marked quality of inde- 
pendence. He is quite active; he possesses a sense 
of direction and farsightedness. He is more in- 
terested in others than himself, more mature, 
more concrete, more definite, and more positive. 
The non-achieving student, on the other hand, is 
characterized by things which are the antithesis 
of these. The non-achiever has rather poor rela- 
tions with both his father and his mother. He 
is characterized by behavior that is dependent 
and passive. He is more interested in himself 
than in others. He is anxious, less mature, often 
exhibiting infantile characteristics. His answers 
are vague and abstract, and they tend to have a 
negative quality. 

Before attempting to interpret the behavior 
we see exhibited on the Sentence Completion 
Test, let us turn to current dynamic psychology 
for a theoretical explanation. In psychology to- 
day, behavior is explained in terms of the de- 
velopment of defense mechanisms, that is, all 
life is predicated on avoiding anxiety or avoiding
threats to our general well-being. The present- 
day psychologist thinks that within a few days 
after birth and certainly within a few weeks of 
age the child learns that there are certain actions 
which bring him security and certain actions 
that threaten him. Because a child is so helpless, 
his well-being and his very life depend upon his 
acceptability. In order to live, he must get the 
approval of his parents or at least must get their 
attention. Consequently, the child acquires a 
sense of values that is prevalent in the home in 
order that he may be less threatened. As long 
as he does the things that are approved, or as 
long as he is accepted, he has a feeling of safety.
Hence, the greater the insecurity, the greater the
anxiety, and thus the greater the dependence he 
has upon his parents because of a fear that they 
may leave him or reject him. The way then for 
him to avoid anxiety is to be accepted and loved 
by his parents. Being secure and being certain 
of acceptability by his parents or parental sub- 
stitutes enables the child to withstand many 
threatening situations in life which may occur 
outside the family group. 


HIGH GRADES AND HAPPY HOMES


In a happy home, where the child is accepted
and recognized, where there is praise and ab-
sence of verbal criticism, the child should feel
less need to be on the defensive and should de-
velop into a more mature and independent
adult. We have known in psychology for many 
years that praise is the greatest motivating 
factor in behavior. It has taken psychoanalytic 
psychology to point to the importance of this 
approval and praise in the parent-child relation- 
ship. The withdrawal of affection and approval, 
therefore, creates anxiety, and this withdrawal 
is something the child dreads. He learns to do 
the things and consequently takes on behavior 
characteristics that will protect him and defend 
him against this anxiety-producing situation. If 
a person threatens him, he becomes anxious, and 
there is no other choice but to develop hostile 
feelings or defensive reactions toward that per- 
son. A person’s anxiety, then, is directly related 
to the lack of acceptance by his parents or pa- 
rental substitutes. By association, the child 
learns to identify all other people with his 
parents and adopts a similar reaction of anxiety 
to them. The absence of anxiety suggests ac- 
ceptance and thus less need to be fearful of 
inter-personal relationships. 

In the light of our preliminary research, our 
theoretical explanation for the high grades of the 
achieving student is as follows: He tends to come 
from a happy home; therefore, he is more ac- 
cepted. This condition motivates achievement 
because receiving praise for many things sug- 
gests that he will be praised for high academic
attainment. Scholastic achievement, then, con-
stitutes an area of safety for him and hence one 
of freedom from anxiety. Another reason for his 
achievement is that a person coming from a 
happy home is free to work. He is less concerned 
with himself because he is less on the defensive. 
He does not need to spend the time thinking 
about a solution to his problems because being 
already accepted, his anxiety and hence his 
problems are greatly reduced. He is free to de- 
vote his energies to work. A third reason for the 
happier person’s achievement is his independ- 
ence. The approval or disapproval of the group 
as far as his academic achievement is concerned 
does not threaten him to the extent that it does 
the low-achieving student. To some students 
high achievement would put them apart from 
the group, and this they cannot stand. The 
happy, secure person at home does not need to 
worry about the rejection of the group if he 
makes a high record. Another explanation for 
greater achievement is that the happy person is 
psychologically a receptive person. He is there- 
fore better able to receive what occurs in class 
because he should have fewer learning blocks 
and has little need to protect himself against 
new experiences. Not only is he more receptive, 
but he is also free to be more expressive and 
hence more productive on examinations and in 
class work generally. A fifth explanation as to 
why the superior student makes a higher score 
on some tests, such as the Scholastic Aptitude 
Test, concerns his vocational goal and far- 
sightedness: he has learned to accumulate skills 
that contribute to the end that he wishes. 


ACHIEVEMENT AND ADJUSTMENT 

The achieving student does not in every case 
represent the ideally adjusted person. Some of 
those in our group have consulted a psychiatrist. 
This does not mean that they possess more emo- 
tional problems than the non-achieving group 
which also sees the psychiatrist; going to a psy- 
chiatrist may be an indication of adjustment, 
since the more seriously disturbed person often 
avoids a psychiatrist’s help. As a group they
have emotional adjustments to make, but to
them high academic achievement is an area that 
is free from anxiety, or is an area that at least 
offers relative freedom from anxiety. 

The causes for the low-achieving student’s 
record can be explained thus: First, being much 
on the defensive and a rigid person, he is not free 
to work easily. He is not as receptive, and hence 
he cannot take in the material as well. More- 
over, if he has taken it in, he cannot give it out 
because of a tendency to hold on to everything 
as a means of defense. Intellectual activity re- 
quires that there be a willingness to express ideas 
and to answer questions. The anxious student 
probably has a certain amount of anxiety about 
giving, which greatly reduces his effectiveness as 
a student. Another reason why the inferior stu-
dent does not achieve is that he is a very de-
pendent person. Achieving places him outside
the group—a situation he cannot stand. Being 
apart from the group is more of a threat than are 
failing grades: he is more of a target. Thus it is 
less threatening to fail than it is to achieve. He 
is a passive, dependent person, and achieving 
makes him an independent person, which would 
create a certain amount of anxiety because he is 
not emotionally equipped to handle his inde- 
pendence. Another explanation is that the pas- 
sive, dependent person is usually very much on 
the defensive and usually a person with a more 
than average amount of concern or guilt about 
his hostility and aggressiveness. Intellectual 
achievement involves competition, which re- 
quires a certain amount of aggressiveness. If the 
person has undue fears about his aggressive ten- 
dencies, it may bring about guilt, and this guilt 
may be displaced on intellectual achievement. 
Guilt from any cause is apt to result in the “not 
what I can do but what I deserve” type of re- 
action. It is easier for him to be passive because 
being an active person means being aggressive, 
which may arouse hostile feelings in someone 
and result in a certain amount of rejection. 

Criticism, authoritarian directives, and un- 
pleasantness generally constitute forms of re- 
jection which arouse anxiety. If the child is
placed on the defensive concerning his school
work or educational program, he will tend to 
avoid that aspect of his environment. It is axio- 
matic in psychology that a person does not 
achieve as well as he should in an area where 
there is anxiety. 

According to our rationale the motive to 
achieve scholastically is directly associated with 
the quality of relationships the child has with 
his parents and with their attitude toward learn- 
ing. Our hypothesis is that motivation for aca- 
demic achievement is associated with a positive 
relationship with one or both parents. 

The most valid approach to the study of mo- 
tivation as stated in our hypothesis would be to 
measure as directly as possible the quality of the 
parent-child relationship. This is not accom- 
plished easily. To discuss honestly and frankly 
the authentic reaction to one’s parents is difficult 
for most of us. This is particularly true if there 
are deeply repressed aggressive reactions toward 
one or both parents. In our study we are at- 
tempting to measure these reactions with the 
use of the Sentence Completion Test, a Check 
List, and a Health Record. 


CHOOSING THE CHECK LIST 


As was stated above, we revised the Sentence 
Completion Test, using a total of forty items. 
We also constructed a Check List of eighty-six 
items of a multiple forced-choice nature. Forty 
of these items were identical to the ones used in 
the Sentence Completion Test. The forced- 
choice selections were taken, with a few excep- 
tions, from the actual responses to items on the 
Sentence Completion Test in the previously 
mentioned study. The other forty items on the 
scale attempted to give a rather wide coverage 
of the relationship between the child and the 
parent in certain strategic and pertinent areas 
in the early home life, such as the parent’s at- 
titude toward discipline, school work, and sex 
education. We wanted an instrument which 
could be scored objectively, and one which 
would cover different kinds of parent-child re- 
lationships. 


The rationale of the Health Record, which we 
also prepared, is that, from a theoretical point 
of view, the greater the security and inde- 
pendence, the less physical complaint. On the 
preliminary study we found some differences 
that led us to believe that the high-achieving 
student is apt to have difficulties along the di- 
gestive tract, whereas the low-achieving stu- 
dent tends to have difficulties in the other areas 
of the body, such as psychological heart condi- 
tions and asthma. There were certain other 
physical conditions associated with low-achiev- 
ing students which suggest lack of maturity and 
dependence. 


ADMINISTERED TO 1200 STUDENTS 


These three tests were administered in Sep- 
tember, 1951, to more than 1200 students, about 
750 of them at M.I.T. and almost 500 of them at 
Wellesley College. The results will be compared 
with previously established predictive statistical 
procedures, such as the Predictive Rating at 
M.I.T. and the Scholastic Aptitude Test at 
Wellesley, and will be studied on the basis of 
the rationale that has been used for the scoring 
of the preliminary study. If this rationale does 
not yield us encouraging results, we shall re- 
score the data for differences in the quality of 
responses of this new population. 

This is a cooperative study. Many people 
have helped in one way or another to get the 
project under way this summer: Professor B. A. 
Thresher and other members of the Office of 
Admissions, Dr. D. L. Farnsworth, Medical Di- 
rector, psychiatrists, psychologists, Miss Susan 
Gerschenkron, Psychometrist, Miss Mary Hen- 
nessey, Statistician, deans, and other adminis- 
trative personnel at M.I.T.; Miss Mary Chase, 
Executive Vice-President of Wellesley College, 
and Professor Heidbreder of the Psychology De- 
partment at Wellesley; and Mr. William Fels 
and Mr. Frank Bowles of the College Entrance 
Examination Board. In its present stages the 
experiment has certain limitations, but it is 
a promising approach to a very important 
problem. 







Science Teaching and the Board’s Science Tests 
Paul F. Brandwein 


If by some improbable chance you and I were 
given the power to dictate the course of edu- 
cation, what would we do about education in 
the sciences? 

If only we could call upon a formula to help 
us, such as E = mc², where E = education,
m = materials of instruction, and c = a con- 
stant based on speed of learning, our burden 
would be considerably lightened. But science 
education, like all education, is not a science; it 
has only just begun to travel the rigorous and 
tedious road toward validity. There is cause for 
satisfaction as we see places where magic, super- 
stition, and authority have been replaced by 
free-ranging speculation based on experience. 
Here and there we see experience giving way to 
experiment, which in a sense is experience in 
search of meaning. Still, we cannot call upon a 
formula... nor can we call upon a body of in-
vestigation whose validity is irreproachable.

Can we call upon the wisdom of experienced 
science educators to help us fashion the cur- 
riculum we desire? Of course, but here, too, we 
gain little comfort; there appears to be the 
widest disagreement for there is the widest range 
of experience. Nevertheless, the observer who 
has an operational approach, that is, who looks 
for what is being done rather than for what is 
being said, and who, in addition, looks not for 
agreement but for direction, will find unmis- 
takable signs of the state of affairs in science 
education and, indeed, in education generally. 

Paul F. Brandwein is Chairman of Science at
Forest Hills High School in New York City and
a member of the Department of Natural Sci-
ences of Teachers College, Columbia Univer-
sity. He is also Science Editor of Harcourt, Brace
and Company.

There was a time when most young men and
women who went to secondary school also went
to college. It seemed clear then that the sec-
ondary school’s curriculum should be introduc-
tory to the college curriculum. In objectives,
curriculum, and method, the college was the 
template of the high school. Thus, success in col- 
lege could often be extrapolated from success in 
high school. It was reasonable to expect that the 
efforts of the College Entrance Board in those 
days would be expended in developing tests to 
measure success in the subject-matter fields. 


RIFTS IN THE SCHOOL CURRICULUM 


A search into the curriculum of the then col- 
lege-centered high school, if I may use a more 
appropriate adjective than subject-centered, 
shows rifts in what would appear on the surface 
to be a placid curriculum. First, there was the 
American dream of a free education for every- 
one; more boys and girls were going to high 
school, and some of these, for one reason or 
another, were not going to college. Second, as 
these students with different needs and interests 
began to make inroads into the high school 
population, there was competition for curricular 
time—new courses began to appear as the es- 
tablished ones began to lose ground. We may all 
recall the fate of Greek, Botany, Ancient and 
Medieval History, and other courses. 

If we follow one transmutation in subject mat- 
ter we can sense the direction of the educational 
wind. What befell Botany and Zoology? They 
became General Biology. An examination of the 
curriculum of this early General Biology, how- 
ever, shows that it was one half-year of Botany 
and one half-year of Zoology; only the course 
title had changed. As teachers asked themselves, 
“To what end?” and began to change method as 
well as content, the course began to approach 
the realities of educational life, that is, it began 
to meet the needs and interests of most of the
students in the schools. To make a long story of
educational development short, General Bi- 
ology went the road from a course which sought 
its mainspring in college preparatory courses 
(and hence was introductory) to one whose 
mainspring was in faculty psychology (and 
hence was valuable for its mental discipline) 
to one which at present seeks to meet the needs 
and interests of the students (and hence has 
its mainspring in the student himself). 

Even now we shall find all three courses exist- 
ing, but the last enrolls a great number of stu- 
dents in the public high schools. This last course 
does not concern itself mainly with the Biology 
of the frog or the garden pea, but of the human 
being. Its problems are not primarily osmosis, 
nor the classification of various invertebrates, 
nor meiosis, nor the fate of the dorsal arch, nor 
even the cell itself. Its problems are those the 
students will meet in life, among them, those of 
nutrition, conservation of natural resources, and 
individual health. Osmotic pressures or the anat- 
omy of the crayfish or any other subject matter 
is useful when it helps solve these problems. The 
modern course in General Biology looks pri- 
marily not to the development of subject-mat- 
ter specialists but to the development of citi- 
zens whose need is to know the kind of Biology 
which will help them live better lives. 


SCIENCE COURSES AND SOCIETY 

Note, too, if you please, that Physics and 
Chemistry are approaching the curricular focus 
of General Biology. Here and there over the 
country, courses in Physical Science, embracing 
Physics and Chemistry, are springing up. Many 
of them are just changes in course title. Others 
are developing as sound general courses in Phys- 
ical Science, designed to meet the needs of 
youngsters who will live in a world changed by 
the physicist and chemist and who will need to 
understand and profit from these changes as 
they occur. 

Teachers are coming closer and closer to ac- 
cepting as the aim of the high school the personal 
and social development of young people. They
realize that there is all the difference in the
world between teaching subject matter as an 
end in itself and teaching it to solve problems 
of living. The criterion for selecting subject mat- 
ter is becoming: Will this course help young 
people to meet their developmental tasks, that 
is, the tasks they must complete to become suc- 
cessful members of a society growing ever more 
complex? 

While courses such as Auto Driving are at 
present on the periphery of the central objec- 
tives of the school, they gain their sanction from 
the needs and interests of teen-age drivers whose 
death toll is increasing. Nevertheless, the proc- 
ess of general education, of which a course in 
Auto Driving is the symptom and not necessar- 
ily a central objective, has been at work at the 
core of the high school curriculum. 


EFFECT ON SCHOLARSHIP 


We need have no anxiety that standards of 
scholarship or other aims of intellectual aspira- 
tion will suffer. Children who have not had 
Greek but have had modern languages, who 
have not had the traditional courses in history 
but have had Social Studies or Problems of Civ- 
ilization, who have not had Botany, Zoology, 
Physics, and Chemistry but have had General 
Science, Biological Science, and Physical Sci- 
ence have shown themselves able to profit from 
college and community life and to contribute to 
both. The lessons of the Eight Year Study, crude 
as they were, need not be lost on us. Indeed, in 
my own “experientation,” to use an awkward 
although more appropriate substitute for the 
word experimentation as used in education, I 
have found that youngsters at the Forest Hills 
High School who took a four-year course in sci- 
ence without regard to subject-matter areas but 
who concerned themselves with major problems 
(e.g., atomic energy, the conquest of disease) 
not only did well in college, but gained more 
honors in science in high school than any com- 
parable group before or since. 

A number of high schools have now intro- 
duced the core programs. In obeisance to college
requirements, these schools may record on their
students’ transcripts a year of English, a year of 
General Science, a year of Community Civics, 
and a year of General Mathematics, but the 
youngsters who have been in these core pro- 
grams may have had a year of work under the 
general title, “Getting to Know Yourself and 
Your Community.” 

Some teachers are even rash enough to pre- 
dict the day when a college admissions officer 
will accept with equanimity a high school pro- 
gram reading: Core 1, Knowing Yourself; Core 
2, Language Tools (English and Foreign Lan- 
guage); Core 3, Problems of Democracy (Social 
Studies); Core 4, Health; Core 5, Science and 
Its Methods (Science for four years); Core 6,
Mathematical Tools (Mathematics, Simple 
Bookkeeping, etc.); or just Core I, Core II, 
Core III, Core IV; and will leave to college en- 
trance examinations the task of determining the 
fitness of the candidate to profit from college 
work. 

The direction of educational movement in the 
public high schools is unmistakable. Its velocity 
is increasing. It may even be false to call it a di- 
rection because it is so thoroughly established. 
The educational vector, the product of the di- 
rection and the force behind it, is the process of 
general education we have been discussing. 

Although general education is difficult to de- 
fine, it can be clearly recognized if one looks at 
the process operationally. The process starts 
with the question, “Why are we giving this 
course?,” continues with an investigation of the 
needs and interests of the students in terms of 
problems of living rather than in terms of sub- 
ject matter, leads into experientation with con- 
sequent evaluation in terms of the objectives of 
the course, and develops into a dynamic rela- 
tionship between teacher and student wherein 
both are engaged in getting significant informa- 
tion and experience and in using the information 
and experience in the solution of meaningful 
problems of living. The process never ends; it is 
its own motivation.

Broadly speaking, the process which is
general education is following somewhat the
same path in the colleges that it has followed 
in the high schools. Our operationally-minded 
observer will note some of the signs of an emerg- 
ing concept of general education. There is a 
rejection of the notion of faculty psychology; 
there is an increasing acceptance of the belief 
that courses, at least in the first two years, 
should not be introductory but should be com- 
plete on their own level; there is an increasing 
acceptance of the fact that most students who 
take a course in the first few years do not neces- 
sarily intend to specialize in that area; there 
is an awareness that the students’ own needs 
and interests may furnish useful objectives for 
the course; there is, in short, a clearly percep- 
tible shift from emphasis on subject matter for 
its own sake to selection of subject matter for 
the students’ sake. 

The difficulties in the way of retooling are 
also perceptible. There are changes in course 
titles. There are changes in content. There are, 
in some places, changes in method. The major 
problem seems to be in recruitment of teachers 
for the very desirable programs which have 
been planned. 


COLLEGE ENTRANCE PINCER 


Be that as it may, there is a pincer movement 
in education which will, it seems, envelop the 
procedures involved in college entrance, in- 
cluding subject-matter requirements. One end 
of the pincer is general education, well estab- 
lished and flourishing in the public high schools: 
high school students are graduating without 
having taken the courses once favored for col- 
lege entrance. Yet these students seek admission 
to the colleges. The other end of the pincer is 
general education as it is being interpreted in 
colleges throughout the country. Looking at it 
from the position of one with some experience 
in both high school and college, it would seem 
to me that the movement in the colleges is as 
yet small, but it is formidable for it not only 
has behind it the force generated by the
increasing demands of many more students for a
college education, but it is also firmly rooted
in sound educational psychology. It will grow.

Where college faculties have accepted general
education as a way of meeting the needs of their
students, subject-matter testing, in its narrow
sense, and subject-matter entrance requirements
are losing ground. This follows since
entrance procedures based on subject matter per
se were established to predict the success of
students in work they had begun in high school.
It seems, therefore, that if this pincer movement
is successful, subject-matter tests will become
progressively less important and aptitude tests
will become more important. But if the nature
of the change itself is any indication of the way
things will go, that is, if evolution and not
revolution apply to testing procedures, then we
will probably not see subject-matter tests give
way at once to aptitude tests. It is likely that
subject-matter tests will evolve into tests of
“developed aptitudes.” There will be a decreasing
tendency to measure details retained through
experience and an increasing emphasis on the
measurement of “developed aptitude” to use
knowledge gained to solve significant problems.


Sample Item: Without Subject Matter Loading


Interpretation and Analysis of Science Reading Materials*


Rivers may be classified as young, mature, or
old. A great deal of material is deposited at the
mouth of the river, such as pebbles, boulders,
silt, and clay. In a young river, water flows
rapidly (sometimes with great speed). The
river may carry large boulders or fair-sized
pebbles and, as a result, the river bed is cut deeply.

In an old river the water flows slowly and
placidly. Fine gravel, silt, and other fine particles
like clay are carried and deposited wherever the
slope levels off.

If for some reason (e.g., geologic upheavals) the
slope of an old river is raised, the water again
begins to flow with great speed. Again large
pebbles and even boulders may be carried.
Again the river bed is cut deeply.


1. According to the passage above, the
cross-section of an old river whose slope has not
been raised should be shaped like one of the
following:

(1) (2) (3) (4) (5)

[Figure: five cross-section shapes, labeled (1) to (5), not reproduced]


2. At the mouth of a river that has progressed
from youth to old age, one should find layers
of material consisting of boulders, pebbles,
gravel, and silt in a definable order, from the
bottom up. Which one of the following is the
correct order?

(1) boulders, pebbles, gravel, silt.
(2) gravel, pebbles, silt, boulders.
(3) silt, gravel, pebbles, boulders.
(4) pebbles, silt, boulders, gravel.
(5) silt, pebbles, gravel, boulders.


3. If a layer of boulders is found deposited at
the mouth of what was once an old river bed,
it may be assumed that:

(1) the slope of the old river was raised.

(2) the slope of the old river was lowered.

(3) for some reason the old slow river carried
boulders along its entire length.

(4) the old river meandered.

(5) the old river became an ox-bow lake.


* Space does not permit publication of other item
types, without subject-matter loading, in
Understanding and Analysis of Methods of Science, in
Application of Principles of Science, and in Analysis
of Observations and Experimental Data.


[ 230 ]


BOARD IN STRATEGIC POSITION 
In this shift of the educational spectrum, the 
College Board is in a strategically important 
position. Such a position brings with it attendant
problems. Should the Board face the past
and support the more or less clearly defined 
position of traditional college entrance require- 
ments and subject-matter testing? Should it 
adjust itself to support the development of 
programs of general education, which by the 
very nature of their genesis must differ with 
different educational environments? Or should 
it engage in a search for relative stability and 
look to the development of tests for aptitudes 
and developed aptitudes? 

In an attempt to plot a position in a period 
of educational flux, the College Board has 
looked for a clearer definition of its objectives 
and of ways to fulfill them. It appears clear 
that the Board fulfills one of its major functions
when it makes available to its consumers pro- 
gressively more valid and reliable instruments 
for the selection, guidance, and placement of 
candidates for college entrance. It will prob- 
ably meet the needs of its consumers best when 
its testing devices are submitted to constant 
experimentation in the field. And this is one of 
the things the Board has chosen to do. 


DOING ONE’S DAMNDEST 


This preliminary statement has been extended
because, in a real sense, it is the raison
d’être of the Science Committee. Science by its
very nature lends itself to the process of general 
education whether it is defined in terms of 
singleness of purpose, i.e., the quest for reality, 
or of singleness of method, i.e., the method of the 
verified hypothesis, or, broadly, as “doing one’s 
damndest with one’s brain, no holds barred.” 
There is still the well-founded view that, minor
differences aside, all the sciences have the same
pervasive aim. The observer who searches the 
journals of education, who reads prefaces to 
courses of study, who sits patiently at meetings 
of scientists and science teachers must conclude 
that there is the widest agreement on the per- 
vasive aims of science teaching. 

He will find the widest agreement, even an 
unyielding position, on the belief that science 
teaching, whatever the field, whatever the curriculum,
whatever the method, should produce
students with a literacy in science. This literacy 
is generally shown by the ability to apply the 
methods of science to a problem at hand, to read 
and analyze scientific material, and to use the 
major concepts of science as they apply to 
modern living. This fits our definition of a 
developed aptitude. 


AGREEMENT ON WHAT? 


Now those of us here, all tough-minded, will 
be disenchanted with such easily won agree- 
ment. We will erode the agreement with such 
valid questions as, “What are the methods of 
science?”; “What are its major concepts?”; 
“What is success in dealing with scientific 
literature?” And there will be many, many other 
questions, all of them valid, all of them neces- 
sary, all of them difficult to answer in our 
present state of ignorance. Nevertheless, there 
is the widest agreement that the pervasive aims, 
the hard seed of science teaching, deal not with 
subject matter per se, but with an understand- 
ing of its methods and an ability to apply these 
methods. 

In addition, the conviction has grown that 
one type of competence (or developed aptitude) 
colleges have a right to expect of candidates 
for admission is the ability to apply scientific 
methods to a problem at hand and to interpret 
and use scientific information. This is clear 
as one examines the literature. And there is also 
the conviction that this developed aptitude can 
be and should be measured. 

This last conviction is, of course, not new. 
It has been one of the goals of those who work 
at the demanding task of inventing test items. 
The inventor of test items is acquainted with 
the many efforts in the field. And indeed, the 
College Board has dealt with it in its various 
reports. Almost fifteen years ago a committee 
of the Board concerned itself with the problems 
of constructing tests in combined sciences. 

In March, 1951, a new Committee on Science 
Testing was appointed by the Board. The work 
which I will here report is based on the imagination,
resourcefulness, and good sense of my
colleagues on this committee: 


Dr. Finla G. Crawford, Syracuse University, 
Chairman 

Dr. Alexander Efron, Stuyvesant High School, 
New York, N. Y. 

Dr. W. Joe Frierson, Agnes Scott College 

Professor Philippe E. Le Corbeiller, Harvard Uni- 
versity 

Dr. Morris Meister, Bronx High School of Science, 
New York, N. Y. 

Dr. Leo Nedelsky, University of Chicago 

Professor Eric M. Rogers, Princeton University 


Professor Richard Sutton, Haverford College 


We are also much indebted to Warren Findley,
John Dobbin, and Bernard Cayne of the
Educational Testing Service.


THE COMMITTEE’S GOAL 
The charge before the committee was clear: 


1. To survey the present science achievement
tests from the viewpoint of both schools and
colleges;

2. To consider ways and means of attaining closer
articulation of objectives in science teaching
in secondary schools and colleges; and

3. To discuss the practicability of a conference
at which science teachers representing schools
and colleges could convene to define purposes
and discuss methods of teaching science.


From the beginning, the committee took the 
position that an operational approach to the 
problem was desirable. The committee thought 
that some of the objectives implied in the first 
charge (to survey the present science achieve- 
ment tests) might be attained by constructing 
and evaluating an entirely new science test and 
using the experience as a basis for making use- 
ful recommendations. The “operation” agreed 
upon was the invention of a single test designed 
(1) to measure the level and scope of the can- 
didate’s scientific knowledge and his ability to 
apply this knowledge to solve problems, and (2) 
to inquire into the candidate’s literacy in 
science as shown by his ability to analyze sci- 
entific literature, to analyze the principles of 
science, and to use scientific methods. 

[ 232 ]


The rationale for the proposed test was based
on the conviction that people living in a culture
in which science plays such a prominent role 
should “feel comfortably at home” in scientific 
media. They should have a certain amount of 
scientific literacy and a certain amount of opera- 
tive content in science, operative content being 
generously defined as the amount and kind of 
science which can be recalled with relative ease. 
Candidates intending to pursue science in col- 
lege or to engage in a profession based upon sci- 
ence may be expected to have a considerably 
greater operative content in science. 

The proposed test was to be designed primar- 
ily to measure the candidate’s ability to under- 
stand and use science, regardless of how the 
competence had been developed. It was in- 
tended to be used, together with other predic- 
tive measures, as a vehicle for predicting com- 
petence to engage in college work. The test was 
to measure the ability of the candidate to use 
science in problem situations. 

When the committee first pondered its charge, 
it was thought that it would be well to elaborate 
a test which would cut across all subject matter 
areas; the test was to draw upon science per se, 
rather than to be confined to familiar arbitrary 
categories. 

It was felt that whether or not a useful test 
was finally developed, the test items, or testing 
instrument, could serve several purposes: (1) to 
evolve specific recommendations in the field of 
science testing; (2) to illustrate in a practical way 
to the existing science examining committees 
the thinking of the Committee on Science Test- 
ing; (3) to affect the construction of the existing 
science tests (indeed this might be the most 
important function of the work which is the 
subject of this report); (4) to test the temper 
and attitude of consumers of CEEB tests towards 
science testing of the type proposed. 


CRITERIA OF EVALUATION 


The test, like all inventions, would be evalu- 
ated in terms of accepted criteria. Three criteria 
would be considered in selecting each item for 
the proposed test: first, the mental process
which the item proposes to examine; second,
the scientific concept or principle involved;
and third, the amount and kind of recall necessary.
The difficulty of the item was to depend
upon a combination of the recall required, the
mental process involved, and, to a lesser extent,
the concept involved.

The members of the Committee on Science
Testing submitted some 150 test items. Of
these, 97 were considered promising. After
being edited and graded in what appeared to be
an ascending order of difficulty, these 97 items
were administered to several groups of students
and teachers.

On the basis of these crude experimental
excursions, the following notions could be
tentatively stated. There was general agreement by
the students tested, and by the teachers, that
this was not alone or even principally a test of
science content, but that in order of emphasis
the items stressed scientific thinking and
understanding of science method, understanding
of principles of science, and recall of content.

It appeared, however, that certain items
carried with them the climate of their intrinsic
content and required a type of recall (latent,
if you will) which limited a student’s willingness
to involve himself in them. Students were
prone to say, “That’s Physics, and I have never
had it,” or “That’s Biology,” etc. This in spite
of the fact that there was universal agreement
among the candidates that this was a “new”
type of test, one that stressed scientific thinking
rather than recall. The problem of the extent
of subject-loading in a general test is still
with us and will require all the ingenuity which
can be mustered.


Sample Items: With Subject Matter Loading


Chemistry


To study the effect of heat on oxides, lead
dioxide was heated and oxygen was liberated.
Manganese dioxide liberated oxygen when
heated. On the basis of the above statements
which one of the following conclusions is
justified?

(1) In the absence of heat, liberation of oxygen
from an oxide is impossible.

(2) The combination of any metal with oxygen
is unstable in the presence of heat.

(3) All oxides liberate oxygen when heated.

(4) Some oxides may be decomposed by physical
methods as well as by chemical reagents.

(5) Lead and manganese are active metals because
their oxides decompose readily on
heating.


Biology


Mr. and Mrs. Y have twins; Mr. Y is blood type
“A” and Mrs. Y is blood type “O.” Both twins
will be:

(1) “A” types.
(2) “B” types.
(3) “A” or “O” types.
(4) “AB” types.
(5) Types unknown because of insufficient data.


Physics


Questions 1-4 involve the following five curves:

[Figure: the five curves are not reproduced here]

For each of the following relationships select the
most appropriate curve:

(1) Pressure in a liquid varies directly with the
depth.

(2) The voltage furnished by a group of fresh
dry cells is constant.

(3) Heat in a given resistance varies with the
square of the current.

(4) The pressure of a gas varies directly with
the absolute temperature (at constant volume).


[ 233 ]


As preliminary investigations, elaboration of
items, discussions with teachers, study of science
curricula, identification of current science se- 
quences, and various other kinds of cerebral- 
searching procedures progressed, it became in- 
creasingly clear that it would also be well to 
consider several factors. 

Students who take two years of science 
generally take General Science and one special- 
ized science. The latter is usually Biology; how- 
ever, depending on the school, Chemistry or 
Physics may also be taken as the second course. 
It is difficult to avoid the inclusion of items
in a general test which, by their very
wording, favor one or the other area.

Teachers tend to identify items in terms of 
subject areas even though they understand that 
the items stress method rather than content. 
In terms of public relations and the acceptance 
of the test for wide use, this factor cannot be 
overemphasized. 

It became increasingly clear that teachers 
(in their belief that each area has certain special 
inalienable benefits) desired certain areas to be 
represented. For instance, Biology teachers 
wanted genetics, photosynthesis, etc.; Physics 
teachers, electricity, sound, etc. 


TWO TESTS DEVELOPED 


In line with these notions, two tentative ex- 
perimental tests were developed. One of these, 
Test I, a General Test, consisted of 50 general 
items without regard to subject-matter fields. 
In Test I, approximately 20 questions were in 
the area of Methods of Science, 10 in the area 
of Interpretation of Science Reading Materials, 
10 in Application of Principles of Science, and 
10 in Analysis of Observation and Experimental 
Data. As can be expected, the categories over- 
lapped. 

A second test consisted of two parts: Part 
I was made up of general items; Part II was 
made up of separate Physics, Chemistry, and 
Biology sections. In these separate sections the 
items had strong subject-matter loading. 

Examples of types of items to be found in 
Tests I and II are printed on pages 230 and 233.
These are not in final edited form, nor validated,
and are fully expendable. However, rough pre- 
testing data showed them to be useful as fodder 
for the construction of a test which fits the 
criteria stated on page 232. 

It is hoped that those who examine the items 
will do so in the realization that they indicate 
the direction, and the direction only, which the 
committee considers promising. As can be sur- 
mised, there are many problems which face us. 
Some of these are clearly apparent and have 
to do with the construction of items and the 
extent of subject-matter loading. Others deal 
with our ability to interpret clearly the per- 
vasive aims in science education, the central 
purpose of programs in general education, and 
the vigor and tenacity of both. Other problems, 
large and small, will surely occur. 


PROGRAM FOR THE FUTURE 


It is relatively easy for those of us trained 
in science to escape into experiment. Accord- 
ingly, a small subcommittee has been appointed 
to carry on the work of producing a science 
test which will bear the brunt of validating 
procedure. The present intention is to produce 
a test which might eventually supplement the 
Biology, Chemistry, and Physics tests in the 
Board’s series; which, in its greater part, will 
cut across subject-matter lines; and which will 
attempt to measure literacy in science. It should 
be emphasized that the whole project in its 
present state of development is susceptible to 
the widest modification should experience and 
experiment indicate such a course. 

In fact, with the preliminary work of test con- 
struction behind them, the first task of the sub- 
committee will be to test the hypothesis upon 
which the larger committee has been working, 
namely, that students who are successful in an 
examination designed primarily to determine 
their understanding of the strategy and tactics 
of science will be successful in college science. 
Whether or not a valid and reliable instrument 
can be invented to do what is proposed will ap- 
pear from the evidence of further investigation. 




The Scholastic Aptitude Test: Items,
Scores, and Coaching
Henry S. Dyer 


By several standards, the SAT has come to 
be the most important test in the Board’s pro- 
gram. It has had twenty-six years of con- 
tinuous use. It is now taken each year by up- 
wards of 70,000 candidates—more than twice 
as many as take any other test offered by the 
Board. Many colleges require it for admission 
and rely heavily on the scores in deciding which 
candidates shall get the nod. Numerous studies 
can be cited to show that it has substantial 
value in the prediction of college success. In 
short it is widely accepted and it really works. 
Why not leave well enough alone? The answer 
of course is obvious: Some of us think that, by 
taking advantage of recent developments in 
testing, the SAT, good as it is, can be made 
considerably better. 

The kind of research that is contemplated 
and just now getting under way may strike you 
as unbelievably slow. Partial and very prelim- 
inary results will not be forthcoming until at 
least a year from now, and firm conclusions are 
not apt to be in sight until the fall of 1953. If 
any major revisions appear to be indicated, 
they will not be incorporated in the test for 
candidates until 1955. This snail’s pace is 
characteristic of any test development that re- 
quires empirical proof of the usefulness of any 
new devices before they are put into effect. The 
goal is to make the SAT more predictive of 
success in more colleges. Any change made in 
the test must demonstrate beyond doubt that 
it will bring us closer to this goal. 





Henry S. Dyer is Director of the Office of Tests
of Harvard University and a research consultant
to the College Entrance Examination Board.


Three separate studies are now in progress. 
Although they differ in angle of approach, they 
are all aimed at one central question: How 
can the Scholastic Aptitude Test be so con- 
structed that every part of it will make a 
maximum contribution to the prediction of 
academic performance in a variety of colleges? 


DOES COACHING HURT THE TEST? 


The first study I shall discuss has come to be 
known as the Coachability Study. It rests on 
the hypothesis that some parts of the present 
test may be failing to make a maximum con- 
tribution because their predictive power is 
vitiated by various kinds of specific training 
that some students receive. The ultimate pur- 
pose of this study is to classify the items of 
the test into three possible types: (1) those 
items that are unaffected by any kind of special 
training such as word drill, intensive practice on 
similar tests, and the like; (2) those items 
which, although affected by special training, do 
not thereby lose their predictive power; and 
(3) those that do lose their predictive power 
because of the special training some students 
get. 

The first type will probably prove to be so 
rare as to be practically inconsequential. It is 
the last two types that are likely to prove of 
particular interest. It is no secret that some 
schools make a special effort to get their students
ready for the SAT. We shall try to find
out what this sort of thing does to the test and 
also what it does to the students. Insofar as the 
test contains items of the kind I have labeled 
(2), the special training students receive is 
all to the good, for such training not only raises 
their test scores but, by definition, also increases
their ability to do college work. On the
other hand, insofar as the test contains items
of type (3), the special training is all to the 
bad, because it only inflates the students’ scores 
without doing them any permanent good. 

To get the necessary data for this study we 
gave the test this past September to two groups 
of twelfth grade students all of whom, we have 
reason to believe, are headed for college. One 
group will receive special training designed to 
raise the test scores. The other group will re- 
ceive no such training. In March all members of 
both groups will take the SAT at the regular 
administration. At that time the relative gains 
of the two groups on each part of the test will 
tell us some of what we want to know, but not 
until we have been able to compare the college 
performance of the two groups at the end of 
next year will we be able to classify the items 
into the three types I have described above. 
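
The three-way classification described above might be sketched as follows. This is only an illustration under stated assumptions: the function name, the two margins, and the use of an item's correlation with later college performance as its "predictive power" are all hypothetical choices, not the procedure the study itself will follow.

```python
def classify_item(gain_coached, gain_control,
                  validity_coached, validity_control,
                  gain_margin=0.05, validity_margin=0.05):
    """Return 1, 2, or 3, following the three item types named in the text.

    gain_*     : rise in proportion answering correctly, September to March
    validity_* : correlation of the item with later college performance
    The margins are arbitrary illustrative thresholds (assumptions).
    """
    extra_gain = gain_coached - gain_control
    if extra_gain <= gain_margin:
        return 1  # unaffected by special training
    if validity_control - validity_coached <= validity_margin:
        return 2  # affected by training, predictive power retained
    return 3      # affected by training, predictive power lost


# Hypothetical figures for three items; not data from the study.
items = {
    "item A": (0.04, 0.03, 0.30, 0.31),
    "item B": (0.20, 0.05, 0.30, 0.31),
    "item C": (0.20, 0.05, 0.10, 0.31),
}
for name, figures in items.items():
    print(name, "is of type", classify_item(*figures))
```

Note that the second and third types cannot be told apart until the college records are in, which is why the classification must wait for the end of the following year.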

The results of the Coachability Study should 
be interesting in a number of ways. They are 
also apt to have some serious implications for 
Board policy, but I think we can leave that 
problem until the data force us to consider it. 


WILL NEW MATERIAL IMPROVE TEST?


The second study rests on the hypothesis that 
there are some strictly intellectual functions
which are important for college work and which
could be, but are not now, sampled by the SAT.
For years the psychometricians have been 
saying that the reason aptitude tests do not 
predict future performance better than they do 
is because personality factors get in the way. 
If we could only develop some decent measures 
of things like motivation and emotional sta- 
bility, we could, it is claimed, improve our cor- 
relations a good deal. This theory seems entirely 
sound, and I go along with it one hundred per 
cent, but the truth of the matter is that we do 
not yet have usable measures of motivation 
and emotional stability. On the other hand, 
recent studies strongly suggest that there are 
a number of areas of purely intellectual activity 
that are measurable and that the SAT is not 
measuring. We think that if we can get some
of these new measures into the SAT, we may
be able to improve its predictive power. 

I shall refer to this part of the research as the 
New Item Study. It breaks down into two sec- 
tions. The first section of the New Item Study 
got under way last January and March when 
we quietly slipped some new material into the 
regular Scholastic Aptitude Test. Lest you be 
shocked by the deviousness of this procedure, 
I hasten to explain that the same sort of thing 
has been going on for years. Recently, it has 
been regular practice to set aside one-sixth 
of the test for the trial of new items. This trial 
material does not enter into the candidate’s 
score, but by having him wrestle with it under 
standard testing conditions, the Board is able to 
secure the information necessary to build up a 
pool of items for future use. 

A considerable difference exists in the use 
that has been made of the trial section in the 
past and the use that was made of it last year. 
Formerly, the material to be tried out was close- 
ly similar to that in the main body of the test. 
Last spring the trial material consisted of seven 
radically new types of items. These new items 
were distributed among the test booklets in such 
a way that approximately one-seventh of the 
candidates was exposed to items of one type, 
another seventh was exposed to items of a 
second type, and so on. What we expect to do is 
to compare the scores made by students on each 
new series of items with their subsequent suc- 
cess in various aspects of college work and also 
with the regular Verbal and Mathematical Ap- 
titude scores. From such an analysis we should 
get a preliminary notion of the kinds of new 
items that are likely to give a substantial boost 
to the predictive power of the test. 
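
The distribution of the seven new item types among the test booklets can be pictured with a small sketch. The rotation scheme, the function name, and the candidate numbers below are assumptions made for illustration; the text does not say how the booklets were actually allotted.

```python
from collections import Counter


def assign_trial_types(candidate_ids, n_types=7):
    """Give each candidate one trial item type (1..n_types) in rotation.

    Rotation is an assumed scheme; it keeps the groups within one
    candidate of each other in size.
    """
    return {cid: (i % n_types) + 1 for i, cid in enumerate(candidate_ids)}


# Hypothetical examination numbers, not real candidates.
candidates = list(range(1, 701))
assignment = assign_trial_types(candidates)
group_sizes = Counter(assignment.values())
print(sorted(group_sizes.items()))
```

Because each candidate meets only one new type under such a scheme, the scores on different types come from different people and cannot be combined for any one student.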


TRY-OUT ON 5,300 FRESHMEN 

Since, however, each student last spring 
was exposed to only one new item-type, we 
shall not be able to tell, from the kind of analy- 
sis I have just referred to, how well the seven 
item-types would work in combination. It was 
for this reason that we undertook the second
section of the New Item Study. This began
with the administration this fall of several com- 
binations of the new items to the freshman 
classes in nine colleges selected to form a rep- 
resentative sample of the Board’s membership. 
In this part of the enterprise we included not 
only the seven item-types tried out in the 
January and March series, but also four addi- 
tional types that it was inconvenient to give 
at that time. The fall tryout involved some 
5,300 freshmen. Next summer we shall be ha- 
rassing registrars for the records of these 
people, so that we can compare their test per- 
formance with their course performance. It 
seems reasonable to suppose that from this ex- 
tensive trial of eleven new types of items—all 
of which for one good reason or another look 
promising—we should be able to find at least 
two or three that will make an appreciable im- 
provement in the test. 


ITEMS MUST PROVE THEMSELVES 


Every step of the way, as you can see, we 
are tying the development of the test to the 
actual performance of students in college. We 
are insisting that before any new type of item 
can get into the regular SAT, it must demon- 
strate that it predicts some aspect of academic 
achievement which is not being predicted by 
items already in the test. It is not sufficient 
that an item may seem promising on its face 
or that it has worked well in somebody else’s 
test or that it fits in with some plausible theory 
about the functioning of the human mind. To 
get into the test it must prove that it has prac- 
tical value for identifying the people who turn 
out to be good students in college. It is this 
emphasis, I believe, that marks off the present 
effort at test development from most of its pred- 
ecessors. 
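
One way to picture the requirement that a new item-type predict something not already predicted is a partial correlation: the correlation of the new score with college performance after the existing score is held constant. The sketch below is an assumption about how such a check could be made, not a description of the analysis actually planned; the function names and the six sets of figures are hypothetical.

```python
import math


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(criterion, new_score, existing_score):
    """Correlation of new_score with criterion, existing_score held constant."""
    r_cn = pearson(criterion, new_score)
    r_ce = pearson(criterion, existing_score)
    r_ne = pearson(new_score, existing_score)
    return (r_cn - r_ce * r_ne) / math.sqrt((1 - r_ce ** 2) * (1 - r_ne ** 2))


# Hypothetical figures for six students; not data from the study.
verbal = [52, 61, 47, 70, 58, 65]          # existing SAT score
new_type = [12, 9, 15, 11, 8, 14]          # score on a trial item-type
grades = [2.4, 2.9, 2.6, 3.4, 2.5, 3.3]    # freshman average
print(round(partial_correlation(grades, new_type, verbal), 3))
```

A partial correlation near zero would suggest that the new material merely duplicates what the test already measures; a substantial one would suggest that it adds something.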

The easiest way to acquaint you with the 
nature of the new material would be to show 
you the tests themselves, but for good and 
sufficient reasons such a procedure is for- 
bidden. The best I can do is to describe in a 
general way some of the functions that we think
the new item-types will be able to measure.

The first type looks on the face of it like the 
usual test in reading comprehension, but we 
hope that it is considerably more. It presents 
the student with a fairly long, closely reasoned 
passage of political argument and then re- 
quires him to draw inferences from this 
material. It forces him to come to grips with a 
considerable body of facts and related argu- 
ments, to understand it, to hold in mind the re- 
lations that exist within it, and to reason se- 
quentially about it. 

The second type of item is like the first 
except that the content of the passage is of a 
scientific nature. 

The third type we call a test of inductive 
reasoning. It requires the student to examine 
a body of data, to hypothesize rules or prin- 
ciples to explain or structure them, and to test 
the hypotheses. 

The fourth type is known as the sufficiency 
of data test. Here the student is presented with 
data that may or may not be needed in solving 
a problem. His task is to indicate whether the 
data are sufficient or superfluous for arriving 
at a solution. 

The fifth type we are calling a test of in- 
tegration. It gives the student a series of rules 
underlying an artificial language and then it 
requires him to hold in mind several of the 
rules at once in order to solve a number of word 
problems. The test looks like a language ap- 
titude test, but we hope that it is tapping a 
function that may be important in a variety of 
disciplines. 


ADAPTED FROM EIGHT YEAR STUDY 


The sixth type is the familiar interpretation 
of data test adapted from the Eight Year Study. 
The student is confronted with several kinds 
of statistical and experimental data and is 
asked to reason from these data to appropriate 
conclusions. 

The seventh type we are calling a test of 
visualization. It is supposed to test the student’s
ability to see in his mind’s eye how a
familiar object would look after a number of
operations had been performed upon it. 

The eighth type is known as a test of best 
arguments. This is another reasoning test in 
which the student is presented with various 
issues and is asked to identify those arguments 
on each side of each question that hold water 
and those that do not. 

The ninth type is a simple memory test in 
which the student is given various kinds of 
material to examine for five minutes and then, 
nearly two hours later, after taking other tests 
in between, he is tested to see how much of the 
material he remembers. This is frankly a test of 
rote memory, and we think it may work be- 
cause although most college teachers sneer at 
rote memory, many of the tests they give still 
demand a lot of it. 

The tenth type is a test of perceptual speed 
and carefulness. It is a short, highly speeded 
test that tries to find out how rapid and ac- 
curate the student is in perceiving likenesses 
and differences in simple visual material. 

The eleventh and last type is called a general 
information test, but its real purpose is to dis- 
cover, if possible, where the dominant intel- 
lectual interests of the student lie by sampling 
his familiarity with common facts in the physi- 
cal sciences, the biological sciences, the social 
sciences, literature, and the arts. 

These new items have come from many 
different sources. Most of them have been found 
useful in other tests, and on many of them there 
is already some evidence that they work as 
predictors. In other words, we are not going 
it blind. An extraordinary amount of ingenu- 
ity, resourcefulness, and good common sense 
has gone into the preparation of the new items. 
Full credit for them goes to the highly com- 
petent and imaginative people who form the 
research and test construction staffs of the 
Educational Testing Service. 


ARE VERBAL AND MATH SCORES ENOUGH? 


Finally we come to the last study in the 
series. Its underlying hypothesis is that the
two scores—Verbal and Mathematical—which 
the test now yields may not be sufficient to meet 
the needs of all the different colleges that use 
the test. All colleges no doubt make certain
common demands upon the mental powers of
their students, and any student who lacks the 
power to meet the common minimum is un- 
likely to succeed in college anywhere. On the 
other hand, if the student is above this point, 
it seems probable that he could fail at one col- 
lege and do very well at another where the de- 
mands are different in kind. The obvious 
example is the difference between the engineer- 
ing school and the liberal arts college: the stu- 
dent who cannot master sufficient mathematics 
to handle engineering may nevertheless be able 
to cope successfully with history and literature. 


DIFFERENCES IN STANDARDS 

This kind of differentiation among colleges 
may go even further. A student who fails at 
one liberal arts college might have succeeded at 
another—not because the standards of the one 
are higher than the standards of the other, but 
because of a qualitative difference in the standards
of the two colleges. College A, for example,
may require more courses in modern
languages and mathematics; College B may
require courses in modern languages but none
in mathematics; College C may require neither
modern languages nor mathematics, but may
prescribe a number of courses in general education.
The differences may be even more subtle
than these: they may be differences in course 
organization, differences in “academic climate” 
which are not apparent on the surface but which 
may vary enormously in their effects upon 
students. 

If this situation exists, then it is unreasonable 
to suppose that aptitude for college work is 
any single thing. Rather there is one combina- 
tion of qualities that constitutes aptitude for 
successful work at College A, another combina- 
tion that constitutes aptitude for successful work 
at College B, and so on. The present SAT recognizes
only two such qualities, verbal ability
and mathematical ability, with the result that
the number of combinations that can be 
measured by it is highly restricted. Recent 
studies suggest that there are many kinds of 
verbal and mathematical ability as well as 
other abilities that one would label neither 
verbal nor mathematical. 


COMBINATIONS OF SCORES 


All of this leads to the notion that if we could 
break down the verbal and mathematical scores 
into sub-scores based on different parts of each 
section, and if we could add one or two other 
scores indicative of other kinds of ability for 
college work, we should be better able to pro- 
vide each college using the tests with a com- 
bination of scores best fitted to its own peculiar 
requirements. 

The data from previous studies on the validity
of the SAT at different colleges are now being
investigated to see whether anything can be 
gained from subdividing the scores on the test 
as it exists today, but it seems likely that no 
very fruitful results will be obtained until we 
have had a chance to examine the far more ex- 
tensive and varied data that will emerge from 
the experimental tests given in the fall. In any 
case, it is certainly within the realm of possi- 
bility that we shall eventually come to the 
Board with the recommendation that five or 
six aptitude scores should be reported rather 
than the present two. It would then be possible 
for each college to determine for itself the com- 
bination of scores that would give the most 
accurate prediction for its own candidates. The 
chances are that such a procedure would raise 
materially the predictive power of the test for 
all colleges and would tend to correct the 
present situation in which we find that the SAT 
works much better in some places than in 
others. 
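
As a rough illustration of how a college might determine its own
combination of scores, the sketch below fits a weight for each score by
ordinary least squares. The technique, the function names (fit_weights,
predict), the use of three sub-scores, and the scale of the figures are
all assumptions made for illustration; the article does not specify a
method or report such data.

```python
# Hypothetical sketch: a college fits its own weights for several
# aptitude sub-scores by ordinary least squares. The method and all
# names are illustrative assumptions, not taken from the article.


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (enough distinct candidates).
    """
    n = len(b)
    # Augmented matrix so that row operations carry b along.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_weights(scores, grades):
    """Return [intercept, w1, w2, ...] minimizing squared prediction error."""
    rows = [[1.0] + list(s) for s in scores]  # leading 1.0 is the intercept
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, grades)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(weights, score):
    """Predicted grade for one candidate's tuple of sub-scores."""
    return weights[0] + sum(w * s for w, s in zip(weights[1:], score))
```

Given one tuple of sub-scores per candidate and the corresponding
first-year grade averages, fit_weights(scores, grades) returns an
intercept followed by one weight per sub-score. Two colleges fitting on
their own records would generally arrive at different weights, which is
the point of the proposal.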

It should be obvious that this whole enter- 
prise is completely dependent upon the sym- 
pathetic cooperation of the colleges and their 
students. I am happy to report that we are 
getting this cooperation in generous amounts.


Elected Officers and Executive Committee


Elected Officers 
Chairman: President Katharine E. McBride, 
Bryn Mawr College 
Vice-Chairman: Provost Samuel T. Arnold, 
Brown University 


Custodians: 
Dr. Claude M. Fuess, Chestnut Hill, Mass., 
Chief Custodian 
Mr. John Irvine Kirkpatrick, University of 
Chicago 
Vice-President Archibald MacIntosh, Haver- 
ford College 


Executive Committee 

Provost Samuel T. Arnold, Brown University, 
Chairman (ex officio) 

Dr. William H. Cornog, Central High School, 
Philadelphia 

Dean Margaret T. Corwin, New Jersey College 
for Women, Rutgers University 

Vice-Chancellor Finla G. Crawford, Syracuse 
University 

Miss Rosamond Cross, Baldwin School 

Dean Frank J. Gilliam, Washington and Lee 
University 

Dr. Richard M. Gummere, Harvard University 

Mr. Allan V. Heely, Lawrenceville School 

Dean Frank R. Kille, Carleton College 

President Katharine E. McBride, Bryn Mawr 
College (ex officio) 

Professor Edward S. Noyes, Yale University 

Professor B. A. Thresher, Massachusetts Insti- 
tute of Technology 

Mr. Herbert H. Williams, Cornell University 


William B. Schrader and Charles T. Myers 
of the Educational Testing Service were the 
co-authors of the May, 1951, Review article, 
“Making Test Scores Meaningful.” The 
article was incorrectly ascribed to Mr. 
Schrader alone. 







Board Publications 


Annual Report of the Director, 1950. De- 
scription of Board activities, lists of mem- 
bers, examiners, readers. Contains a sec- 
tion, “Data for Interpreting the Tests.” 
83 pages. $.50. 


Bulletin of Information and Sample Tests. 
Advice to candidates and parents, dates of 
examinations, registration and fees, de- 
scription of tests, sample questions. 56 
pages. Free. 


College Board Review. News and research 
of the College Entrance Examination 
Board. Subscription: one year, $.50; two 
years, $1. Hard-covered, looseleaf binders 
for the Review stamped in gold leaf are 
available at cost, $2. 


College Handbook. Descriptions of each 
of the 134 member colleges—their study 
programs, admission terms, freshman year 
expenses, scholarships and other aid, and 
to whom to write for information. A spe- 
cial section on national scholarship pro- 
grams. Also, listings of colleges by sex of 
students, region and enrollment, and a
table of Army, Navy, and Air R.O.T.C.
units. 292 pages. $1.


The College Board, Its First Fifty Years, 
by Claude M. Fuess. “The full story of the 
College Entrance Examination Board’s 
contribution to twentieth-century educa- 
tion in America.” Published by Columbia 
University Press, New York, 1950. 224 
pages. $2.75. 


Order from the Secretary, College Entrance
Examination Board, 425 West 117th
Street, New York 27, N.Y.


Dates, Tests, Fees: 1951-52 


EXAMINATION DATES 


December 1, 1951 
January 12, 1952 
March 15, 1952 
May 17, 1952 
August 13, 1952 


EXAMINATION PROGRAMS*

Morning Program

Scholastic Aptitude Test
(Verbal Section)
(Mathematical Section)

Afternoon Program
(a maximum of three afternoon tests)

English Composition
Social Studies
French Reading
German Reading
Latin Reading
Spanish Reading
Biology
Chemistry
Physics
Intermediate Mathematics
Advanced Mathematics
Spatial Relations
Pre-Engineering Science Comprehension


EXAMINATION FEES

Morning Program and Afternoon Program
Morning Program only
Afternoon Program only





*The College Transfer Test, for students transferring 
from one college to another, is offered on the same dates 
and at the same centers as the College Entrance Tests. It 
is administered in the morning. The fee is $6. Bulletins 
of Information and application blanks for the College 
Transfer Test will be sent upon request. Address the Col- 
lege Entrance Examination Board, Box 592, Princeton, 
N. J., or Box 9896, Los Feliz Station, Los Angeles 27, Cal. 










